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Abstract 

Background: Verotoxigenic £ coli (VTEC) is the cause of severe gastrointestinal infection especially among infants. 
Between 10 and 20 cases are reported annually to the National Infectious Disease Register (NIDR) in Finland. The 
aim of this study was to identify explanatory variables for VTEC infections reported to the NIDR in Finland between 
1997 and 2006. We applied a hurdle model, applicable for a dataset with an excess of zeros. 

Methods: We enrolled 131 domestically acquired primary cases of VTEC between 1997 and 2006 from routine 
surveillance data. The isolated strains were characterized by virulence type, serogroup, phage type and pulsed-field 
gel electrophoresis. By applying a two-part Bayesian hurdle model to infectious disease surveillance data, we were 
able to create a model in which the covariates were associated with the probability for occurrence of the cases in 
the logistic regression part and the magnitude of covariate changes in the Poisson regression part if cases do 
occur. The model also included spatial correlations between neighbouring municipalities. 

Results: The average annual incidence rate was 4.8 cases per million inhabitants based on the cases as reported to 
the NIDR. Of the 131 cases, 74 VTEC 0157 and 58 non-0157 strains were isolated (one person had dual infections). 
The number of bulls per human population and the proportion of the population with a higher education were 
associated with an increased occurrence and incidence of human VTEC infections in 70 (17%) of 416 of Finnish 
municipalities. In addition, the proportion of fresh water per area, the proportion of cultivated land per area and 
the proportion of low income households with children were associated with increased incidence of VTEC 
infections. 

Conclusions: With hurdle models we were able to distinguish between risk factors for the occurrence of the 
disease and the incidence of the disease for data characterised by an excess of zeros. The density of bulls and the 
proportion of the population with higher education were significant both for occurrence and incidence, while the 
proportion of fresh water, cultivated land, and the proportion of low income households with children were 
significant for the incidence of the disease. 



Background 

Verotoxigenic E. coli (VTEC) is a highly infectious cau- 
sative agent for severe gastrointestinal infection with 
haemorrhagic diarrhoea, which can lead to haemolytic 
uraemic syndrome (HUS) or thrombocytopaenic pur- 
pura (TTP) [1]. Most of the diagnosed VTEC infections 
in developed countries, including Finland, are of the ser- 
ogroup 0157, but an increasing number of cases are 
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found with non-0157 serogroups 026, Olll, 091, 
O103 and 0145 [2,3]- The majority of VTEC infections 
occur among young children, predominantly during the 
summer months [2-4]. VTEC infection is rare in Fin- 
land, with only 10-20 cases reported annually to the 
National Infectious Disease Register (NIDR) [5]. 

Several risk factors for VTEC infections have been 
identified from outbreak data, modelling studies, and 
descriptive and analytical epidemiological studies. Cattle 
are considered the main reservoir for VTEC [6-8]. Liv- 
ing, visiting or working in close proximity to a farm 
with cattle or having a household contact working in 
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such a farm have been found to be important risk fac- 
tors [9-11]. Cattle density or VTEC-positive cattle den- 
sity have been established as important explanatory 
variables for VTEC infections [4,12-16] or cases of HUS 
[16]. Also farm density per area of land [14] and animal 
manure application to soil are important [13]. Under- 
cooked beef meat or meat products, water, vegetables 
and unpasteurized milk, ice cream, juice and cider can 
be sources of VTEC infections [17], and consumption of 
burgers and cold meat cuts are established risk factors 
[9,18]. Pathogenic strains of VTEC have been detected 
in minced beef meat, beef meat cubes and cuts, raw cat- 
tle, sheep and goat milk and sprouts [11]. Lake water or 
contaminated municipal water systems have caused out- 
breaks of VTEC 0157 [17,19,20] and consumption of 
untreated surface water is a risk factor for non-0157 
[18]. 

Poisson distribution based regression models are stan- 
dard models for count data, including for infectious dis- 
ease surveillance data [21]. However, if the data has an 
excess of zeros, the assumptions of the Poisson distribu- 
ted models are no longer valid. Models based on nega- 
tive binomial distribution may slightly correct for this 
problem, but when there is a severe inflation of zeros, 
more specific models are needed. Zero-inflated and hur- 
dle models have been developed to overcome this pro- 
blem [21,22]. With the use of a complementary log-log 
link function, the hurdle model reduces to a standard 
Poisson regression if the data is Poisson distributed [22]. 
These types of models have been applied recently for 
modelling sleeping sickness [23], cholera prevalence [24] 
and bacterial counts [25]. 

The aim of this study was to identify explanatory vari- 
ables for VTEC infections reported to the NIDR in Fin- 
land between 1997 and 2006. VTEC infection is a rare 
disease in Finland and we used Poisson distribution 
based hurdle models with variable selection based on in- 
depth questionnaires and known risk factors. We chose 
the hurdle model based on its ability to handle an over- 
dispersion of zeros and applicability to scarce data. The 
incidence rate of VTEC infection at municipality level 
was used as an outcome variable and explanatory vari- 
ables included various agricultural, socioeconomic and 
environmental factors. These explanatory variables also 
included several factors that have not been studied with 
VTEC infections previously. The null hypothesis that 
the populations in different municipalities in Finland are 
at the same risk of contracting VTEC infection was 
tested using a Poisson distribution based hurdle model. 

Methods 

Microbiological characterisation 

The presence, isolation and biochemical identification of 
VTEC bacteria from stool cultures was done as 



previously described [7]. The isolates were O grouped as 
previously described [7] with the specific O antisera 
obtained from THL (National Institute for Health and 
Welfare), Statens Serum Institute (SSI, Denmark) and 
Oxoid (Oxoid, Basingstoke, UK). The isolates of the 
0157 strains were phage typed as previously described 
[2]. Chromosomal DNA of the isolates was character- 
ized by the internationally standardized pulsed-field gel 
electrophoresis (PFGE) method [7], 

Definitions for the microbiologically confirmed VTEC 
cases 

Domestic case 

A case of VTEC infection with no history of foreign tra- 
vel during the two weeks prior to onset of symptoms. 
Index case 

The VTEC case with "earliest onset of symptoms"- date 
or isolation date within a household or outbreak. 
Household case 

A VTEC case living in the same household with the 
index case. 
Outbreak case 

Two or more epidemiologically and microbiologically 
linked VTEC cases caused by strains of identical subtype 
and not of the same household. 
Sporadic case 

A VTEC case with no epidemiological or microbiologi- 
cal link to another VTEC case. 
Primary case 
Index or sporadic case. 

Surveillance of VTEC in Finland 

For all cases reported to the NIDR since 1997, date of 
birth, sex, place of residence, the earliest reported date, 
sampling date, travel information, and death notification 
were reported. An in-depth interview as a part of the 
surveillance has been conducted since 1998 [2], with the 
questionnaire reformulated in 2003. The interviews were 
conducted either by THL or the local health district. 
The information and question types that differed 
between the questionnaires were either used in conjunc- 
tion or information was combined from the two forms. 
However, in some cases data were insufficient for a full 
analysis of the in-depth questionnaires. The study mate- 
rial is part of routine enhanced surveillance carried out 
in Finland. The data presented does not include any 
individual data so that any person could be identified. 
The interviewed case patients have given informed con- 
sent for the surveillance. The data was accessed from 
the NIDR based on the justification on the infectious 
disease law (583/1986) and regulation (786/1986). 

The information that was available from both forms or 
could be obtained by combining information included: 
Place of residence (postal code), attending school or day 
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care and having VTEC positives in the same household. 
Clinical information included presence of diarrhoea, 
visible blood in the stool, HUS or TTP as a complica- 
tion, admittance to a hospital, and death. Certain risk 
factors were asked in one or both forms, including living 
or visiting farms with domestic animals, consumption of 
raw or undercooked meat, unpasteurized milk or 
untreated water, and swimming in natural water. 

Modelling 

As the infectious disease surveillance data is count data 
and there was an excess number of zeros in the data 
(mean number of cases 0.31 per municipality with var- 
iance of 1.2), we developed a Bayesian Poisson comple- 
mentary-log log (clog log) hurdle model [22,26] with a 
spatial correlation factor for taking into account the 
neighbouring municipalities [27,28]. If a Poisson or 
negative binomial distribution based count model is 
applied to data with an excess of zeros without addres- 
sing these mixtures, the model can be strongly affected. 
The hurdle model was chosen due to its ability to han- 
dle the over-dispersion of zeros. In the zero-inflated 
models the count of zeros is mixed into two compo- 
nents, while in the hurdle models the zero and positive 
counts are modelled by separate processes. Zero-inflated 
models are designed to handle excessive zero counts, 
while hurdle models can also be adjusted for too few 
zeros. As the data were quite scarce and there was no 
need to distinguish between structural and sampling 
zeros, the hurdle model was chosen for the present 
study [21,22]. The hurdle model is composed of two 
parts, a binary model (logistic regression) generating 
positive counts versus zero counts, and a zero truncated 
Poisson model generating positive counts only. In a hur- 
dle model, the data is considered as being generated by 
a process that generates positive counts only after clear- 
ing a zero barrier, or hurdle. Until the hurdle is cleared, 
the process generates a binary response and the covari- 
ates associate with the probability for occurrence of the 
cases in the logistic regression part. After the hurdle is 
cleared, the process generates only a positive counts 
response and the covariates estimate the magnitude for 
the incidence of VTEC cases. We used a Bayesian full- 
likelihood approach to modelling the data, taking into 
account also the missingness in the covariate data (using 
imputation and missingness indicators). 

Domestic primary cases of VTEC were included in the 
analysis as outcome variables for the 10-year follow-up 
period. These were assigned with home municipality 
according to information on the in-depth questionnaires 
or NIDR information using a municipality division of 
416 municipalities in Finland in 2007. The explanatory 
variables for modelling included various agricultural, 
socioeconomic and environmental factors. The results 



from the descriptive epidemiology of the present study 
and literature were used to identify potential variables to 
be included into the model. We performed a univariable 
analysis in the hurdle model with all potential explana- 
tory variables and in the multivariable analysis we tested 
those variables that had p-values < 0.20 in the univari- 
able analysis. Of the correlated variables, only the most 
significant ones and one per correlation group were 
included in the final model within each group, as 
marked in Additional file 1. Correlation coefficients 
within each group (1-5) were significant and > 0.80. To 
identify explanatory variables in the multivariable model, 
Gibbs' variable selection was used [29] with all explana- 
tory variables as listed in Additional file 1. Those vari- 
ables with a posterior inclusion probability of > 0.5 in 
the Gibbs' variable selection were included in the final 
model. The significant variables were tested with cross 
validation in the univariable analysis to identify influen- 
tial values. For details of the model, see Additional file 2 
and Additional file 3. 

The statistical packages used included R (version 
2.11.1), SPSS (version 18), Arcgis (version 9.3) (for data 
pre-processing), WinBUGS (version 1.4.3) and Stata 
(version 9.2). The hurdle models are described in the 
Additional file 2 and Additional file 3. 

Results 

Description of the cases 

Between 1997 and 2006, 247 cases of microbiologically 
confirmed VTEC infections were reported to the NIDR 
in Finland. There was an average annual incidence rate 
of 4.8 (range 1.9-12.0) per million inhabitants and 25 
(10-62) cases on average were identified annually. Of all 
cases, 46 cases (19%) had a history of foreign travel; 
these cases were excluded from further analyses. Out- 
break cases accounted for 29 (12%) and household cases 
for 77 (31%) of all the cases, and of these, 36 domestic 
index cases were enrolled. Altogether 131 domestically 
acquired primary cases were enrolled. Of the 131 cases, 
95 (73%) cases were sporadic. 

Of all 247 cases, 249 strains were isolated from pri- 
mary stool cultures of patients or their close contacts 
with VTEC infection. Two patients had a dual infection 
caused by two VTEC strains. Of the 131 domestically 
acquired primary cases enrolled in this study, 132 VTEC 
strains were isolated: 74 (56%) were serogroup 0157 
and 58 (44%) were of the non-0157 serogroup; one case 
had dual infections of both serogroups. Of the 0157 
strains, 60 (81%) typed to sorbitol negative and 14 (19%) 
to the sorbitol positive 0157 group. Within the 0157 
group, the most predominant phage types (PT) were 
PT2, PT88 or its variant, PT8 and PT4. The genomic 
DNA of the strains gave 33 and 50 different PFGE pat- 
terns for the VTEC 0157 and non-0157 strains, 
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respectively. The non-0157 strains typed into 20 ser- 
ogroups, the most common O groups being O103, 
0145 and 026. 

Of the 131 primary cases, 51 (39%) were aged younger 
than 5 years and 23 (18%) were older than 40 years. 
Half (66, 50%) were male, with the gender distribution 
also being equal in the age groups (data not shown). 
Interviews were conducted for 83 (63%) cases. The post- 
code for the home address was known for 25 (30%) of 
the cases. Secondary household VTEC positives were 
reported in 24 cases (29%). 

The symptoms were typical of VTEC infection: 74 
(94%) had diarrhoea and 69 (90%) visible blood in the 
stools. HUS developed in 21 (28%) cases, and of these, 5 
(7%) had TTP; in total, 61 (74%) were admitted to a 
hospital, and none of the cases died. Information on the 
reported risk factors was scarce: of the 37 cases aged 0- 
6 years, 8 (22%) attended day care. Living or visiting a 
farm with domestic animals was reported in 17 (22%) 
cases. Unpasteurized milk products were consumed in 
16 (20%) cases, raw or undercooked meat was consumed 
in 17 (24%) cases. Consumption of untreated water was 
reported in 6 (17%) and swimming in natural water in 3 
(8%) cases. 

Modelling of the data 

The modelling unit was the municipality and between 1 
and 11 cases were identified in 70/416 (17%) municipali- 
ties (Figure 1). Of all the cattle, socioeconomic and 
environmental variables, 17 variables were included in 
the multivariable models (Additional file 1). 

In the multivariable model, bulls per human population 
and level of higher education per population were signifi- 
cant in the zero part of the hurdle models, indicating the 
importance of these variables for the occurrence of the 
infection (Table 1). The number of bulls per human popu- 
lation, the number of farms under cultivation per area, the 
proportion of fresh water to total surface area, the level of 
higher education per population and the proportion of 
low income households with children were positively sig- 
nificant in the zero-truncated Poisson of the hurdle mod- 
els, indicating significance relating to the incidence of 
VTEC cases. The zero part of the hurdle model should be 
interpreted so that 1% change in bulls per population in a 
certain municipality or 1% change in level of higher educa- 
tion per population increases the approximated odds for 
the occurrence of VTEC by exp(0.0r 5 p bulls /1.386)-l = 4% 
and exp(0.0r e p educ /1.386)-l = 4%, respectively. Likewise, 
the 1% change in bulls, farms, fresh water, educational 
level and in low income households in the Poisson part 
increases the standardised incidence ratio (SIR) by exp 
(0.01*p buUs ) = 5%, exp(0.01*p farms ) = 2%, exp(0.01*p fresh _. 

water) = 2%, eXp(0.01*P educ ) = 6%, eXp(0.01*Pl ow income) = 

7%, respectively [30]. 




Figure 1 Incidence rate of sporadic cases of VTEC infections 
per million inhabitants in Finland by municipality 1997-2006. 



Interestingly, using a cross validation procedure in the 
univariable or multivariable hurdle models, several more 
influential values were found. When two influential 
observations (in both zero and Poisson parts of the 
model) were removed from the data, the fresh water 
variable became negatively significant. When examining 
these municipalities in more detail, it became plausible 
that there may have been prolonged, undetected out- 
breaks. One of these municipalities was a medium size 
city with nine cases, with six cases during one calendar 
year and of those, four had identical PFGE profiles. The 
other was a small municipality with a population of just 
over 10 000 that had four cases during a two-year per- 
iod with different PFGE patterns. Furthermore, these 
municipalities were among those with the highest pro- 
portion of fresh water, suggestive of water borne out- 
breaks. Also the number of cattle was high in these 
municipalities. 

Discussion 

VTEC infection has a low average annual incidence rate 
of 4.8 per million inhabitants in Finland. The cases 
described in the present study were predominantly chil- 
dren with typical symptoms of VTEC infection: Most 
cases had haemorrhagic diarrhoea and some cases of 
HUS were identified. Half of the cases were infected 
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Table 1 Final multivariable hurdle models, with and without the spatial correlation variable 



vallaule [ ) 


Hurdle model with spatial correlation 
variable 

truncated Poisson 


zero model 


Hurdle model 

truncated 

Poisson 


zero model 


Bulls per population 


5.0 (2.2-9.1) 


5.0 (1.3-9.0) 


5.1 (2.2-9.3) 


4.8 (1.1-8.7) 


Proportion of adult population with middle education 


5.6 (2.0-9.1) 


5.0 (2.3-8.1) 


5.8 (1.9-10.7) 


4.6 (1.5-7.7) 


or higher 










Proportion of low income households with children 


7.1 (1.9-12.4) 


-(ns, 


7.2 (1.9-12.3) 


-(ns, 






excluded) 




excluded) 


Number of farms under cultivation per square km 


2.0 (0.7-3.2) 


-(ns, 


2.1 (0.5-3.3) 


-(ns, 






excluded) 




excluded) 


Fresh water per total surface area 


1 .9 (0.4-3.3) 


-(ns, 


1 .9 (0.4-3.4) 


-(ns, 






excluded) 




excluded) 



(*) Parameter estimates (with 95% equal tail credible intervals) 



with 0157 serogroup strains. Detailed microbiological 
analysis including genotyping was essential to distin- 
guish between outbreak and non-outbreak strains. 
Approximately a third of cases had a household contact 
who was VTEC infection positive, indicating direct 
transmission of the infection. Many of the cases 
reported visits to farms or consuming undercooked 
meat or unpasteurized milk; however, it was difficult to 
assess the impact of the risk factors obtained from the 
in-depth questionnaires as the information was scarce. 
Bias due to under-reporting of the cases is considered 
to be low for VTEC; however, diagnostic differences 
may occur between health care districts. 

Hurdle modelling has not yet been widely used on 
infectious disease surveillance data [23,25,31]. One rea- 
son for using the hurdle model in the present study was 
the fact that the variance was much higher than the 
mean in the observed number of cases per municipality. 
The data was also characterised by an excess of zeros. 
As the zero and the truncated Poisson parts of the 
model gave different sets of significant variables, this 
further suggested the need for this type of model. If the 
coefficients of the binary part and the truncated Poisson 
part are close to each other, the hurdle model may be 
replaced by a standard Poisson regression model [22]. 
From the technical point of view, the prior distributions 
used were as weakly informative as possible, which guar- 
anteed the convergence of the simulations. The results 
from the simulations were robust, especially to changes 
in the variances of the spatial factors. We also recom- 
mend using a full-likelihood approach in the multivari- 
able analysis to explain the mechanism for the missing 
values. 

We identified the proportion of beef cattle to human 
population and the proportion of population with higher 
education as being associated with increased occurrence 
and incidence of VTEC infections. The proportion of 
the population with higher education is likely to repre- 
sent an indirect indicator of consumption habits, possi- 
bly eating undercooked beef steak. In addition, the 



number of farms under cultivation per area, fresh water 
per area and the proportion of low income households 
with children were associated with increased incidence 
of the infection. These are likely to represent routes for 
the spread of the infection; the low income households 
may be an indirect indicator of consumption habits. The 
model without the spatial variable essentially gave the 
same results. Our results concur with a US study on 
ecological factors for gastrointestinal infections, where 
the proportion of the population living on a farm (posi- 
tive association), low education (negative association) 
and living in the south (negative association) were the 
most important socioeconomic factors for acquiring a 
VTEC infection [32]. 

The finding of beef cattle as a strong explanatory vari- 
able was not unexpected as cattle have been found to be 
an important risk factor for VTEC infections in epide- 
miological and modelling studies [9-11,13,14]. The 
VTEC prevalence is low in cattle in the Nordic coun- 
tries, including Finland [33,34]. The overall cattle den- 
sity [12] or dairy cattle or calves density [16] have also 
been found be significant, but these differences may 
reflect correlations between the cattle variables, variation 
in agricultural practices, and the ways in which data are 
collected. The significance of beef cattle over the other 
cattle variables may be due to a susceptibility of young 
animals to VTEC, raising the bull cattle in finishing 
units with multiple sources of animals, and the faecal 
contamination of the bulls and the environment during 
the finishing period [35]. The beef cattle density was an 
important factor both for the occurrence of the disease- 
confirming the role of cattle as a reservoir of the infec- 
tion-and the incidence of the infection, indicating that 
contact with cattle may be an important route for 
spreading the infection. A relationship between human 
VTEC infections and VTEC positive cattle density has 
been found [14]. Also indirect cattle variables like the 
proportion of land where manure is applied has been 
found to be significant [13]; in the present study the 
proportion of cultivated land was also a significant risk 
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factor for the incidence of the VTEC infections. Spread- 
ing animal manure on fields is common practice in Fin- 
land, with most of the manure coming from cattle [36]. 

The result for the proportion of fresh water surface 
per area is intriguing. Previous studies suggest drinking 
and swimming water as a source of VTEC infections 
and outbreaks [6,18-20,37]. However, two influential 
values had a strong effect for this variable in the present 
study. It would be interesting to examine the risk factors 
separately for water borne cases; this information was 
not widely available. There is recent evidence that con- 
taminated water is an important route of infection in 
cattle [38] Campylobacter infections have been found to 
have a positive association with water pipe-length per 
population and a negative association with proportion of 
population receiving household water from a public sup- 
ply [39]. It is noteworthy that the proportion of salted 
water was not a significant variable in the multivariable 
analysis. Finland has thousands of small lakes, but it 
also has a long sea coast along the western and southern 
borders. In a Swedish study, it was found that most of 
the VTEC cases lived along the coastline, near lakes or 
along rivers; however, the type of water was not speci- 
fied [14]. VTEC prevalence studies have indicated that 
cases often have a private well or have experienced a 
water failure recently [12,40]. 

The finding that the proportion of the population with a 
higher education and the proportion of low income house- 
holds with children being associated with increased inci- 
dence of the infection is somewhat contradictory. 
However, we feel that these two variables may explain dif- 
ferent phenomena. There is conflicting evidence about 
educational level as a risk factor for VTEC infections 
[32,41]. However, it is known that people with higher edu- 
cation eat raw or undercooked ground beef more fre- 
quently [42], a known risk factor for VTEC infection [9]. 
The proportion of households where the main supporter 
of the household works in agriculture did not remain in 
the final model; however, there was a correlation between 
this and other agricultural variables. There may also be 
immunity: It has been found that in outbreaks, long-term 
exposure can lead to immunity [20] . In other studies, hav- 
ing a household member with an agricultural occupation 
was found to be a risk factor [9]. The most likely risk fac- 
tor found for sporadic VTEC infection is having contact 
with a farm [11]. 

The study was limited by the low number of VTEC 
cases; therefore spatiotemporal analysis was not used. This 
was compensated for by using specific models designed 
for scarce data and performing a purely spatial analysis. 
The seasonal trends could not be evaluated properly due 
to the low number of cases. Another limitation of the 
study was the unavailability of data regarding consumption 
of beef or other food items per municipality. It has been 



found in Scotland that of all VTEC infections, 16% of 
cases had a significant food-related exposure, such as eat- 
ing high-risk foods or having a contact working with meat 
[10]. Considering the VTEC infections in our study and 
the strong evidence of domestically acquired infections, 
transmission via imported food stuff would be plausible, 
based on the similar pheno- and genotypic characteristics 
of the non-sorbitol fermenting, sorbitol fermenting and 
non-0157 strains isolated in Finland and other European 
countries [7,43,44]. 

In conclusion, cattle are the main reservoir for VTEC 
infections as well as being also an important factor for 
the incidence of the disease. In addition, higher educa- 
tion per population was found to be associated both 
with the occurrence and the incidence of the disease, 
implying that food exposures may play a role. The role 
of fresh water and other environmentally related vari- 
ables warrants further studies. Socioeconomic factors 
like low income and educational level expedite the inci- 
dence of VTEC infection. The recommendations for 
control measures for the prevention of VTEC infections 
given by the WHO include good hygiene in animal and 
slaughter process and food retail, consumer education 
on proper handling of foods, and up-to-date interna- 
tional and national legislation and regulation [45]. 

Conclusions 

We enrolled 131 cases of VTEC with 74 VTEC 0157 
and 58 non-0157 strains isolated (one person had dual 
infections). With Poisson hurdle models we were able to 
distinguish between risk factors for the occurrence of 
the disease (zero part of the model) and incidence of 
the disease (zero truncated Poisson part of the model) 
for data characterised by an excess of zeros in the case 
counts. The number of bulls per human population and 
the proportion of population with a higher education 
were identified as being associated with increased occur- 
rence and incidence of human VTEC 0157 and non- 
0157 infections. The proportion of fresh water per area, 
the proportion of cultivated land per area and the pro- 
portion of low income households with children were 
associated with increased incidence of VTEC infections. 
We recommend using a Bayesian framework with impu- 
tation for the missing values of the explanatory variables 
in conjunction with the missingness equations. 

Additional material 



Additional file 1: Variables included in the modelling study and 
their sources. List of agricultural, socioeconomic and environmental 
variables used in the present study. 

Additional file 2: The statistical hurdle model for the VTEC 
infections. The statistical hurdle model in detail used in the present 
study. 
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Additional file 3: Winbugs code for the hurdle model. Winbugs code 
for the model used in the present study. This file can be viewed with 
Winbugs (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml) 
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